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Abstract. We describe a subtle error which can appear in numerical calculations involving 
Qh the spacing statistics of eigenvalues of random unitary matrices. 
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1. Introduction 



The Random Matrix Theory (RMT) of the classical compact groups has become a cen- 
^1 tral tool in analytic number theory By modeling the Riemann zeta-function and other 

L-functions with the characteristic polynomial of a random unitary matrix, new conjectures 
have been made about L-functions, which have also been useful for proving new results. 
One of the reasons for this success is that many quantities which are mysterious in number 
theory can be computed exactly in RMT. Examples include the statistical behavior of the 
zeros [6], [7J [3j [8] , value distribution [U [5] , and moments [1] . 

There are some questions about L-functions for which the RMT analogue has not yet been 
answered. But in many of those cases one can exploit the fact that it is easy to generate 
Haar-distributed random matrices for the unitary, symplectic, and orthogonal groups. See 
Mezzadri [9] for a complete discussion. Thus, one can usually obtain numerical evidence to 
q ■ support a conjecture or to check a calculation. 

In many cases, data generated from relatively small matrices is almost indistinguishable 
from the limiting case of large matrices. For example, the normalized nearest neighbor 
spacing of eigenvalues of 15 x 15 random unitary matrices differs from the limiting case of 
^ large unitary matrices by so little that the difference is not readily visible in a histogram. 

Despite the simplicity of generating random matrices, it turns out there is a subtle bias 
in the way many people generate and perform experiments on random matrices. I have 
committed this error (not realizing it was an error) many times, and it never made a dif- 
ference. But then I tried to check the next-to-leading order term of a fairly involved RMT 
calculation [2J. The numerics refused to confirm the calculation, and it turned out that the 
numerics were wrong. Fortunately, the error is easily avoided once you know about it. 

The next section describes a simple example of the situation in which this error occurs, 
and gives some data to illustrate the problem. In section [3] the underlying cause of the 
problem is explained. 

2. The neighbor spacing of eigenvalues 

The sample problem concerns the neighbor spacing of the eigenvalues of random unitary 
matrices. These spacings are completely understood: we choose this example to illustrate 
what can go wrong in the numerical experiments. 
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Suppose U G U (M) is an M x M unitary matrix. The characteristic polynomial of U is 

M 

(2.1) Au(x)= ]J(x-e i0m ) 

m=l 

where {e 1 ,...,e M } are the eigenvalues of U and — 7r < #i < • • • < < n are the 
eigenangles of [/. Let <5 m = ^{6 m+ i — 6 m ) denote the normalized neighbor spacing of the 
eigenangles, where we set Om+i = 0i + 2n so that 5m is the "wrap around" neighbor spacing. 
Collectively the M numbers 5i, ... ,5m are 1 on average, because 5± + ■ ■ • + 5m = M by 
construction, but as commonly implemented on a computer, the individual 5j are not one 
on average. That is the point of this paper. 

2.1. Generating random matrices. The computer code and data in this paper are from 
Mathematica, but the code should be understandable without in-depth knowledge of Math- 
ematica, and the apparent anomalies in the data have nothing to do with Mathematica. 

The following code defines a function randunitary [M] which generates a Haar-random 
matrix in U(m). The first line loads the Mathematica package required to generate normally 
distributed random numbers. The second line sets up the generation of the normally dis- 
tributed random numbers, and the third command implements the algorithm described by 
Mezzadri 0. 

«St at istics 'NormalDistribution' 

norm = NormalDistribution [0, 1] 

randunitary [M_] : =Block [{gmatrix , q , r , d , dd} , 

gmatrix=Table [Random [norm] +Random [norm] *I , {i , 1 , M} , { j , 1 , M}] ] ; 
{q,r} = QRDecomposition [gmatrix] ; 

d=DiagonalMatrix [Table [dd=r [ [j ;dd/Abs[dd] ,{j,l,M}]]; 
q.d] 

Note that this code already fixes a non-obvious error that is present in the most straight- 
forward way to generate a random unitary matrix. Namely, the matrix q in the above code 
should be a Haar-random matrix in U(M), but it isn't. The problem is that Mathematica's 
QRDecomposition function introduces a bias which causes the matrix q to have fewer eigen- 
values very close to 1. Multiplying by the diagonal matrix d fixes that problem. This issue 
arises in the QRDecomposition routine of every computer algebra package. See Mezzadri [U] 
for details. 

Note also that this code is for Mathematica 5. Some modifications may be needed for 
Mathematica 6 because the functions for generating random numbers have been changed. 

The following code extracts the eigenangles and defines the normalized neighbor spacings. 
In this example we use 14 x 14 matrices. 

M=14; 

mymatrix=randunitary [M] ; 

eigangles= (M/(2 Pi)) Sort [Arg [Eigenvalues [mymatrix] ]] ; 
delta [j_] : =eigenangles [ [j+1] ] - eigenangles [ [j] ] ; 
delta [M] : =eigenangles [ [1] ] - eigenangles [ [M] ] + M; 
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2.2. Problems with the eigenangle spacing. Depending on how you use it, the above 
code has an error. Here is how the error manifests itself. For fixed j, the average (expected) 
value of delta [j] is not equal to 1, even in the large M limit. While it is true that, by con- 
struction, the average of delta [1] , . . . , delta [M] is identically equal to 1, each individual 
delta [j] does not average to 1. Table [27T1 shows the average of delta [j] for various j and 
M, averaging over 100,000 matrices. 



M = 


14 


22 


32 


Si 


0.94345 


0.94597 


0.94506 


S3 


0.99367 


0.99549 


0.99387 


S 7 


0.99912 


0.99836 


1.00045 


Sn 


0.99352 


0.99926 


0.99745 


Srand 


1.00260 


0.99948 


1.00147 



Table 2.1. Averages of Sj for random matrices in U(M) for various j and M. 
Each column is the average for 100,000 matrices. The bottom row averages Sj 
for a randomly chosen j for each matrix. 

As the table shows, at least some of the selected Sj differ significantly from 1. Curiously, 
most are less than 1 on average. The last row, S ran( i, is generated by randomly using one of 
Si, ... , Sm for each matrix, and then averaging those random choices. We know that S ran d 
must average to 1, and the closeness of S ran d to 1 should suggest that (at least some of) the 
other values differ significantly from 1, as opposed to arising from fluctuations due to a small 
sample size. 

2.3. More curiosities in the neighbor spacings. The example in the previous section 
is unrealistic, because nobody would use only one neighbor spacing for each matrix. Even 
if that gave the right answer, it would be much more efficient to use several, or all, of the 
available neighbor spacings. However, it is quite common for people to just choose the "lazy 
person's neighbor spacings" Si, . . . ,Sm-i- That is, throw away the "wrap around" spacing, 
which avoids the need to program that as a special case. One could argue (incorrectly!) that 
since Haar measure is rotationally invariant, all the neighbor spacings are the same, and so 
no bias is introduced by omitting one value. That argument is wrong, as we shall see. Here 
is the average of S±, . . . , Sm-i for the same data set used to generate Table 12.11 



M = 


14 


22 


32 


average of Si, . . . , Sm-i 


0.9862 


0.99157 


0.99419 


average of 5m 


1.1796 


1.1768 


1.17988 



Table 2.2. Average of Si, . . . ,Sm-i, and the average of Sm, for the same 
data set of 100,000 matrices used to generate Table 12.11 

As Table 12721 shows, the average of the "lazy person's neighbor spacings" is less than 1. If 
that is true, then the average of the "wrap around" spacing must be greater than 1, which 
Table [2721 also confirms. 

The fact that Sm is larger than average is not due to an error in the code. It is due to a 
bias caused by the way we choose the eigenangles, which we describe in the next section. 
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3. HOW NOT TO CHOOSE A RANDOM GAP 

Suppose you want to choose a random gap between the eigenangles of a random matrix, 
so that each gap is equally likely. The following procedure is incorrect: choose a random 
point on the unit circle, and then select the gap which includes that selected point. That 
method introduces a bias, because large gaps are more likely to be selected than small gaps. 
For example, if all the eigenvalues were in one-half of the unit circle, the biggest gap would 
be selected at least half the time. That is, the expected size of a gap surrounding any given 
point will be larger than average. 

That bias is implicit in the way the eigenangles have been chosen in the Mathematica 
code above. Because we restrict the eigenangles to the interval [— vr,7r), the "wrap around" 
gap is chosen to include the point —1. Thus, the expected size of 5m should be larger than 
average, as the data shows. A calculation finds that the expected size of a gap selected 
in this way is the mean of the square of the nearest neighbor spacing. For the large M 
limit of eigenangles from U(M), the square of the normalized nearest neighbor spacing is 
approximately 1.180, which agrees well with the data in Table 12721 And since gu is larger 
than average, the rigidity in the eigenangle spacing forces g\ and Qm-i to be smaller than 
average. The effect fades away so that gM/2 is only slightly smaller than average, That is, 
for fixed j, the expected value of Sj is less than 1, although the amount less than 1 is a 
rapidly decreasing function of j. The average of 6i, . . . , Sm-i is approximately 1 — 0.18/M, 
but even that can be a significant difference if one is exploring the dependence on the size 
of the matrix. 

Thus, even if the distribution of points on the unit circle is rotationally invariant, there 
is a bias introduced by the simple procedure of taking the Arg[] of the points and dealing 
instead with numbers in the interval [— 7T, 7r). In particular, the shortcut of leaving out the 
"wrap around" gap can have unintended consequences. 
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